Document Reconstruction by Layout Analysis of Snippets
Internal identifier: 000786 (Main/Exploration); previous: 000785; next: 000787
Authors: Florian Kleber; Markus Diem; Robert Sablatnig [Austria]
Source:
- Proceedings of SPIE, the International Society for Optical Engineering [ 0277-786X ] ; 2010.
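French descriptors
- Pascal (Inist): Analyse image, Imagerie, Algorithme, 0130C, 4230, Réserve, Méthode
English descriptors
- KwdEn: Algorithms, Image analysis, Imagery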
Abstract
Document analysis is used to analyze entire forms (e.g. intelligent form analysis, table detection) or to describe the layout or structure of a document. Skew detection of scanned documents is also performed to support OCR algorithms that are sensitive to skew. In this paper, document analysis is applied to snippets of torn documents to calculate features for reconstruction. Documents may be destroyed either intentionally, to make the printed content unavailable (e.g. tax fraud investigation, business crime), or through time-induced degradation of ancient documents (e.g. bad storage conditions). Current reconstruction methods for manually torn documents rely on shape, inpainting, and texture synthesis techniques. This paper shows that document analysis of snippets can support the matching algorithm by providing additional features. These features come from a rotational analysis, a color analysis, and a line detection. As future work, it is planned to extend the feature set with the paper type (blank, checked, lined), the type of writing (handwritten vs. machine printed), and the text layout of a snippet (text size, line spacing). Preliminary results show that these pre-processing steps can be performed reliably on a real dataset consisting of 690 snippets.
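Affiliations: Institute of Computer Aided Automation, Vienna University of Technology, Favoritenstr. 9, 1040 Vienna, Austria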
Links toward previous steps (curation, corpus...)
- to stream PascalFrancis, to step Corpus: 000171
- to stream PascalFrancis, to step Curation: 000606
- to stream PascalFrancis, to step Checkpoint: 000160
- to stream Main, to step Merge: 000791
- to stream Main, to step Curation: 000786
The document in XML format
<record><TEI><teiHeader><fileDesc><titleStmt><title xml:lang="en" level="a">Document Reconstruction by Layout Analysis of Snippets</title>
<author><name sortKey="Kleber, Florian" sort="Kleber, Florian" uniqKey="Kleber F" first="Florian" last="Kleber">Florian Kleber</name>
</author>
<author><name sortKey="Diem, Markus" sort="Diem, Markus" uniqKey="Diem M" first="Markus" last="Diem">Markus Diem</name>
</author>
<author><name sortKey="Sablatnig, Robert" sort="Sablatnig, Robert" uniqKey="Sablatnig R" first="Robert" last="Sablatnig">Robert Sablatnig</name>
<affiliation wicri:level="3"><inist:fA14 i1="01"><s1>Institute of Computer Aided Automation, Vienna University of Technology, Favoritenstr. 9</s1>
<s2>1040 Vienna</s2>
<s3>AUT</s3>
<sZ>3 aut.</sZ>
</inist:fA14>
<country>Autriche</country>
<placeName><region type="land" nuts="2">Vienne (Autriche)</region>
<settlement type="city">Vienne (Autriche)</settlement>
</placeName>
</affiliation>
</author>
</titleStmt>
<publicationStmt><idno type="wicri:source">INIST</idno>
<idno type="inist">10-0398506</idno>
<date when="2010">2010</date>
<idno type="stanalyst">PASCAL 10-0398506 INIST</idno>
<idno type="RBID">Pascal:10-0398506</idno>
<idno type="wicri:Area/PascalFrancis/Corpus">000171</idno>
<idno type="wicri:Area/PascalFrancis/Curation">000606</idno>
<idno type="wicri:Area/PascalFrancis/Checkpoint">000160</idno>
<idno type="wicri:doubleKey">0277-786X:2010:Kleber F:document:reconstruction:by</idno>
<idno type="wicri:Area/Main/Merge">000791</idno>
<idno type="wicri:Area/Main/Curation">000786</idno>
<idno type="wicri:Area/Main/Exploration">000786</idno>
</publicationStmt>
<sourceDesc><biblStruct><analytic><title xml:lang="en" level="a">Document Reconstruction by Layout Analysis of Snippets</title>
<author><name sortKey="Kleber, Florian" sort="Kleber, Florian" uniqKey="Kleber F" first="Florian" last="Kleber">Florian Kleber</name>
</author>
<author><name sortKey="Diem, Markus" sort="Diem, Markus" uniqKey="Diem M" first="Markus" last="Diem">Markus Diem</name>
</author>
<author><name sortKey="Sablatnig, Robert" sort="Sablatnig, Robert" uniqKey="Sablatnig R" first="Robert" last="Sablatnig">Robert Sablatnig</name>
<affiliation wicri:level="3"><inist:fA14 i1="01"><s1>Institute of Computer Aided Automation, Vienna University of Technology, Favoritenstr. 9</s1>
<s2>1040 Vienna</s2>
<s3>AUT</s3>
<sZ>3 aut.</sZ>
</inist:fA14>
<country>Autriche</country>
<placeName><region type="land" nuts="2">Vienne (Autriche)</region>
<settlement type="city">Vienne (Autriche)</settlement>
</placeName>
</affiliation>
</author>
</analytic>
<series><title level="j" type="main">Proceedings of SPIE, the International Society for Optical Engineering</title>
<title level="j" type="abbreviated">Proc. SPIE Int. Soc. Opt. Eng.</title>
<idno type="ISSN">0277-786X</idno>
<imprint><date when="2010">2010</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
<seriesStmt><title level="j" type="main">Proceedings of SPIE, the International Society for Optical Engineering</title>
<title level="j" type="abbreviated">Proc. SPIE Int. Soc. Opt. Eng.</title>
<idno type="ISSN">0277-786X</idno>
</seriesStmt>
</fileDesc>
<profileDesc><textClass><keywords scheme="KwdEn" xml:lang="en"><term>Algorithms</term>
<term>Image analysis</term>
<term>Imagery</term>
</keywords>
<keywords scheme="Pascal" xml:lang="fr"><term>Analyse image</term>
<term>Imagerie</term>
<term>Algorithme</term>
<term>0130C</term>
<term>4230</term>
<term>Réserve</term>
<term>Méthode</term>
</keywords>
</textClass>
</profileDesc>
</teiHeader>
<front><div type="abstract" xml:lang="en">Document analysis is done to analyze entire forms (e.g. intelligent form analysis, table detection) or to describe the layout/structure of a document. Also skew detection of scanned documents is performed to support OCR algorithms that are sensitive to skew. In this paper document analysis is applied to snippets of torn documents to calculate features for the reconstruction. Documents can either be destroyed by the intention to make the printed content unavailable (e.g. tax fraud investigation, business crime) or due to time induced degeneration of ancient documents (e.g. bad storage conditions). Current reconstruction methods for manually torn documents deal with the shape, inpainting and texture synthesis techniques. In this paper the possibility of document analysis techniques of snippets to support the matching algorithm by considering additional features are shown. This implies a rotational analysis, a color analysis and a line detection. As a future work it is planned to extend the feature set with the paper type (blank, checked, lined), the type of the writing (handwritten vs. machine printed) and the text layout of a snippet (text size, line spacing). Preliminary results show that these pre-processing steps can be performed reliably on a real dataset consisting of 690 snippets.</div>
</front>
</TEI>
<affiliations><list><country><li>Autriche</li>
</country>
<region><li>Vienne (Autriche)</li>
</region>
<settlement><li>Vienne (Autriche)</li>
</settlement>
</list>
<tree><noCountry><name sortKey="Diem, Markus" sort="Diem, Markus" uniqKey="Diem M" first="Markus" last="Diem">Markus Diem</name>
<name sortKey="Kleber, Florian" sort="Kleber, Florian" uniqKey="Kleber F" first="Florian" last="Kleber">Florian Kleber</name>
</noCountry>
<country name="Autriche"><region name="Vienne (Autriche)"><name sortKey="Sablatnig, Robert" sort="Sablatnig, Robert" uniqKey="Sablatnig R" first="Robert" last="Sablatnig">Robert Sablatnig</name>
</region>
</country>
</tree>
</affiliations>
</record>
To manipulate this document under Unix (Dilib)
EXPLOR_STEP=$WICRI_ROOT/Ticri/CIDE/explor/OcrV1/Data/Main/Exploration
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000786 | SxmlIndent | more
Or
HfdSelect -h $EXPLOR_AREA/Data/Main/Exploration/biblio.hfd -nk 000786 | SxmlIndent | more
To link to this page within the Wicri network
{{Explor lien |wiki= Ticri/CIDE |area= OcrV1 |flux= Main |étape= Exploration |type= RBID |clé= Pascal:10-0398506 |texte= Document Reconstruction by Layout Analysis of Snippets }}
This area was generated with Dilib version V0.6.32.